~ based on Chapter 3, Deep Learning by Ian Goodfellow et al.
This note provides a quick glance at probability and information theory in relation to deep learning. In particular, it covers their theoretical roles, some common concepts and definitions, as well as pointers to further reading.
Motivations

Probability theory defines a set of rules for deriving uncertain statements and for reasoning in the presence of uncertainty. In the context of machine learning, its laws tell us how AI systems should reason, so probability theory provides a theoretical framework for designing and reasoning about such systems.

Information theory, on the other hand, allows us to quantify the amount of uncertainty in a probability distribution, i.e. how much information is conveyed by a signal or an event.

Common concepts and definitions
Probability Mass Function (PMF) describes a probability distribution over a discrete random variable $X$, denoted $P(X)$.
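To be a valid PMF, $P$ must assign each state $x$ a probability between 0 and 1 and be normalized:

$$0 \le P(x) \le 1 \;\text{ for every state } x, \qquad \sum_{x} P(x) = 1.$$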
Probability Density Function (PDF) describes a probability distribution over a continuous random variable $X$, denoted $p(X)$.
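A PDF does not give the probability of a specific state directly; instead, the probability of $x$ landing in a set is the integral of $p$ over that set. Its defining properties are:

$$p(x) \ge 0 \;\text{ for all } x, \qquad \int p(x)\,dx = 1.$$

Note that $p(x)$ itself may exceed 1; for example, the uniform density on $[0, \tfrac{1}{2}]$ equals 2 on that interval.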
Marginal probability is the probability distribution over a subset of the random variables, e.g. given the joint distribution $P(x, y)$, we are often interested in the marginal distribution $P(x)$.
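The marginal is recovered with the sum rule for discrete variables, or the corresponding integral for continuous ones:

$$P(x) = \sum_{y} P(x, y), \qquad p(x) = \int p(x, y)\,dy.$$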
Conditional probability is the probability of some event, given that some other event has happened
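Using the notation above, the conditional probability of $y$ given $x$ is

$$P(y \mid x) = \frac{P(y, x)}{P(x)},$$

which is only defined when $P(x) > 0$.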
Chain rule of conditional probabilities states that any joint probability distribution over many random variables can be decomposed into conditional distributions over only one variable each.
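Written out, for random variables $x^{(1)}, \ldots, x^{(n)}$:

$$P(x^{(1)}, \ldots, x^{(n)}) = P(x^{(1)}) \prod_{i=2}^{n} P(x^{(i)} \mid x^{(1)}, \ldots, x^{(i-1)}),$$

e.g. $P(a, b, c) = P(a)\, P(b \mid a)\, P(c \mid a, b)$.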
Expectation (expected value) of a function $f(x)$ with respect to a distribution $P(x)$ is the average or mean value that $f$ takes on when $x$ is drawn from $P$.
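For discrete and continuous variables respectively:

$$\mathbb{E}_{x \sim P}[f(x)] = \sum_{x} P(x) f(x), \qquad \mathbb{E}_{x \sim p}[f(x)] = \int p(x) f(x)\,dx.$$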
Variance measures how much the values of $f(x)$ vary as we sample different values of $x$ from $P$.
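It is the expected squared deviation of $f(x)$ from its mean, and its square root is the standard deviation:

$$\mathrm{Var}\big(f(x)\big) = \mathbb{E}\Big[\big(f(x) - \mathbb{E}[f(x)]\big)^{2}\Big].$$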
Covariance describes how much two values are linearly related to each other
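In symbols:

$$\mathrm{Cov}\big(f(x), g(y)\big) = \mathbb{E}\Big[\big(f(x) - \mathbb{E}[f(x)]\big)\big(g(y) - \mathbb{E}[g(y)]\big)\Big].$$

Independent variables have zero covariance, but zero covariance does not imply independence, since covariance only captures linear relationships.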